Various datasets have been proposed for simultaneous localization and mapping (SLAM) and related problems. Existing datasets often include small environments, have incomplete ground truth, or lack important sensor data, such as depth and infrared images. We propose an easy-to-use framework for acquiring building-scale 3D reconstruction using a consumer depth camera. Unlike complex and expensive acquisition setups, our system enables crowd-sourcing, which can greatly benefit data-hungry algorithms. Compared to similar systems, we utilize raw depth maps for odometry computation and loop closure refinement which results in better reconstructions. We acquire a building-scale 3D dataset (BS3D) and demonstrate its value by training an improved monocular depth estimation model. As a unique experiment, we benchmark visual-inertial odometry methods using both color and active infrared images.
translated by Google Translate
Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved. Multi-task models provide different types of outputs for a given scene, yielding a more holistic representation while keeping the computational cost low. We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps. Our model successfully predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame. Extensive experiments were done on the Virtual KITTI 2 dataset and we demonstrate that our model solves multiple tasks, without a significant increase in computational cost, while keeping high accuracy performance. Code is available at https://github.com/juanb09111/PanDepth.git
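Holistic scene understanding is pivotal for the performance of autonomous machines. In this paper, we propose a new end-to-end model that performs semantic segmentation and depth completion jointly. The vast majority of recent approaches treat semantic segmentation and depth completion as independent tasks. Our approach takes RGB and sparse depth as inputs and produces a dense depth map and the corresponding semantic segmentation image. It consists of a feature extractor, a depth completion branch, a semantic segmentation branch, and a joint branch that further processes the semantic and depth information together. Experiments on the Virtual KITTI 2 dataset provide further evidence that combining the two tasks, semantic segmentation and depth completion, in a multi-task network can effectively improve the performance of each task. Code is available at https://github.com/juanb09111/smantic Depth.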
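The aim of this paper is to assess the quality of explanation heatmaps for image classification tasks. To assess the quality of explainability methods, we approach the task through the lens of accuracy and stability. In this work, we make the following contributions. First, we introduce the Weighting Game, which measures how much of a class-guided explanation is contained within the correct class's segmentation mask. Second, we introduce a metric for explanation stability that uses zooming/panning transformations to measure differences between saliency maps with similar contents. Using these new metrics, we run quantitative experiments to evaluate the quality of explanations provided by commonly used CAM methods. The quality of explanations is also contrasted between different model architectures, and the findings highlight the need to consider model architecture when choosing an explainability method.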
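As a rough illustration of the Weighting Game idea described in the abstract above, the following minimal sketch computes the share of saliency mass that falls inside a class segmentation mask. The function name `weighting_game_score`, the list-of-lists input format, and the choice to ignore negative attributions are assumptions made for this example; the paper's exact definition may differ.

```python
from typing import Sequence


def weighting_game_score(
    saliency: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
) -> float:
    """Return the fraction of saliency mass that lies inside the mask.

    Hypothetical helper: `saliency` and `mask` are same-shaped 2D grids.
    """
    total = 0.0
    inside = 0.0
    for sal_row, mask_row in zip(saliency, mask):
        for value, in_mask in zip(sal_row, mask_row):
            # Assumption: negative attributions carry no weight.
            weight = max(value, 0.0)
            total += weight
            if in_mask:
                inside += weight
    # An all-zero map explains nothing, so score it as 0.
    return inside / total if total > 0 else 0.0


# Toy 2x2 example: the right column belongs to the correct class.
saliency = [[0.0, 1.0], [1.0, 2.0]]
mask = [[False, True], [False, True]]
score = weighting_game_score(saliency, mask)
```

A score near 1 would mean the explanation concentrates on the object of the correct class, while a score near 0 would mean it mostly highlights background.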
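We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views from a sparse set of inputs. Recent generalizing view synthesis methods also leverage radiance fields, but their rendering speed is not real-time. Existing methods can train and render novel views efficiently, but they cannot generalize to unseen scenes. Our approach addresses real-time rendering for generalizing view synthesis and consists of two main stages: a holistic radiance field predictor and a convolution-based neural renderer. This architecture not only infers consistent scene geometry based on implicit neural fields but also renders new views efficiently on a single GPU. We first train HRF-Net on multiple 3D scenes of the DTU dataset, and the network produces plausible novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that HRF-Net outperforms state-of-the-art neural rendering methods on various synthetic and real datasets.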
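This paper presents an efficient symmetry-agnostic and correspondence-free framework, referred to as SC6D, for 6D object pose estimation from a single monocular RGB image. SC6D requires neither a 3D CAD model of the object nor any prior knowledge of its symmetries. Pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification. SC6D is evaluated on three benchmark datasets (T-LESS, YCB-V, and ITODD) and achieves state-of-the-art performance on the T-LESS dataset. Moreover, SC6D is computationally more efficient than the previous state-of-the-art method SurfEmb. The implementation and pre-trained models are publicly available at https://github.com/dingdingcai/sc6d-pose.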
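To make the decomposition in the abstract above more concrete, the sketch below shows one standard way that a predicted 2D object center and a z-axis distance could be combined into a full 3D translation under a pinhole camera model. This is not taken from the SC6D code: the function name `backproject_center` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration only.

```python
from typing import Tuple


def backproject_center(
    u: float, v: float, tz: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Recover (tx, ty, tz) from a 2D center (u, v) and depth tz.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels; lens distortion is ignored.
    """
    tx = (u - cx) * tz / fx  # horizontal offset scales with depth
    ty = (v - cy) * tz / fy  # vertical offset scales with depth
    return (tx, ty, tz)


# Hypothetical intrinsics for a 640x480 image.
translation = backproject_center(
    u=420.0, v=240.0, tz=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0
)
```

Together with the estimated 3D rotation, such a translation vector would complete the 6D pose.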
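This paper focuses on perceiving and navigating 3D environments using echoes and RGB images. In particular, we perform depth estimation by fusing an RGB image with echoes received from multiple orientations. Unlike previous works, we go beyond the field of view of the RGB image and estimate dense depth maps for substantially larger parts of the environment. We show that echoes provide holistic and inexpensive information about the 3D structure that complements the RGB image. Moreover, we study how echoes and wide field-of-view depth maps can be used in robot navigation. We compare the proposed methods against recent baselines using two sets of challenging, realistic 3D environments: Replica and Matterport3D. The implementation and pre-trained models will be made publicly available.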
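Neural image coding now represents the state of the art in image compression. However, much work remains to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural and training novelties revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-based neural networks and DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of the proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to E2E_T_OL, the top-performing end-to-end learned approach in a 2021 competition. Our codec clearly outperforms E2E_T_OL and compares favorably to VVC and RLVC in some settings.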
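We present HybVIO, a novel hybrid approach that combines filtering-based visual-inertial odometry (VIO) with optimization-based SLAM. The core of our method is a robust, independent VIO with improved IMU bias modeling, outlier rejection, stationarity detection, and feature track selection, which is tunable to run on embedded hardware. Long-term consistency is achieved with a loosely coupled SLAM module. In academic benchmarks, our solution yields excellent performance in all categories, especially in the real-time use case, where we outperform the current state of the art. We also demonstrate the feasibility of VIO for vehicular tracking on consumer-grade hardware using a custom dataset, and show good performance compared to current commercial alternatives. An open-source implementation of the HybVIO method is available at https://github.com/spectacularai/hybvio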
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.